1  Abstract


The  integral  form  of  the  instrument  transmission  function  for  a  one-dimensional  pixel  in  a  two- 
dimensional  optical  system  is  presented.  The  integral  is  solved  explicitly  in  the  paraxial  ray 
approximation  for  a  single  spatial  Fourier  component  of  a  Lambertian  object.  The  difference  between 
signals  from  adjacent  pixels  is  derived.  It  is  shown  to  have  zero  derivative  with  respect  to  focusing 
error  when  the  focusing  error  is  zero,  i.e.,  it  is  a  weak  source  of  range-from-focus  information. 
Describing  the  instantaneous  focusing  error  as  the  sum  of  a  fixed  offset  and  a  time-domain  sinusoidal 
dither,  the  power  spectrum  of  the  signal  from  each  individual  pixel  is  shown  to  contain  large  first  and 
second  harmonic  terms  for  physically  reasonable  values  of  the  parameters.  The  first  harmonic  signal 
is  proportional  to  the  product  of  the  dither  amplitude  and  the  offset.  The  second  harmonic  signal  is 
proportional  to  the  square  of  the  dither  amplitude  and  is  independent  of  offset.  The  two  coefficients 
are  identical  except  for  an  integral  numerical  factor.  It  is  suggested  that  the  ratio  of  second  harmonic 
to  first  harmonic  signals  is  thus  potentially  a  powerful  measure  of  offset,  i.e.,  of  focusing  error  in  the 
limit  of  zero  dither,  and  thus  of  range-from-focus  pixel-by-pixel.  Extending  the  model  to  three 
dimensions,  removing  the  approximations,  extending  the  model  to  natural  scenes,  and  verifying  and 
implementing  the  results  experimentally  are  outlined  briefly. 




2  Introduction 

Image  focusing  [3]  is  conventionally  regarded  as  a  spatial-domain  activity:  the  focus-controlling 
parameter  (lens-to-sensor  plane  distance  in  a  camera,  focal  length  in  the  eye)  is  presumed  to  be 
adjusted  with  the  goal  of  maximizing  the  amplitudes  of  the  high  spatial  frequency  image  components. 
The focusing signal, i.e., these amplitudes, is derived from pixel-to-pixel signal differences. The
focusing  information  available  from  these  differences  is  in  reality  weak.  Thus  most  practical  focusing, 
e.g.,  in  film  and  video  photography,  is  done  indirectly,  without  reference  to  the  image,  by  an  open- 
loop  method  using  a  rangefinder  (e.g.,  a  parallax  based  split-image  method)  arbitrarily  coupled  to  the 
image  distance.  In  humans,  depth  perception  is  known  to  be  derived  from  the  fusion  of  focus  and 
binocular  parallax  cues.  However  the  focusing  cue  is  easily  discounted:  most  people  have  no  trouble 
understanding  stereo  photos  even  though  focus  is  confined  to  the  screen-plane,  while  the  conflicting 
convergence  cues  are  controlled  by  the  offset  between  corresponding  points  in  the  left  and  right  eyes’ 
images [6].

In  this  report  I  partially  model  the  image,  i.e.,  the  signal  associated  with  each  pixel  in  the  sensor 
plane,  simply  and  approximately  described  by: 

•  the  object  modeled  as  a  Fourier  amplitude  for  an  arbitrary  spatial  frequency  and  phase; 

•  the  object  distance  z,  the  image  distance  z',  and  their  relationship  via  the  lens  equation; 

• the sensor plane distance $z''$ and the pixel diameter $2p$

in two dimensions, i.e., for cylindrical optics.

The result shows explicitly why pixel-to-pixel signal differences are a weak source of focusing information: the derivative of the pixel-to-pixel signal difference with respect to sensor plane distance $z''$ is zero precisely at perfect focus $z'' = z'$, which makes it operationally difficult to find the exact focus using only the spatial domain information.

I then examine the predictions that the model makes in the longitudinal direction. This is conveniently imagined as an experiment in the time domain: the signal from each pixel is modulated by dithering the sensor plane distance as $\sin\omega t$. The dominant AC signal appears at the fundamental dither frequency, and precisely in phase with it, and the next largest harmonic is the second, corresponding to a $\cos 2\omega t$ term. The fundamental signal is proportional to the product of the dither amplitude and the offset distance between the image plane and the sensor plane, whereas the second harmonic signal is proportional to the square of the dither amplitude, and is independent of the offset distance. The proportionality constant for the second harmonic is exactly one-fourth the proportionality constant for the fundamental. I then use this model to show how a pair of synchronous amplifiers [4] tuned to $\sin\omega t$ and $\cos 2\omega t$ could be used in a ratio mode to detect focus precisely, and thus robustly to deduce range-from-focus pixel-by-pixel.


3  Model 

For geometrical simplicity, and for the accompanying simplicity in the degree and limits of the integrals representing the instrument transmission function¹, in this introductory report I will model a cylindrical rather than a spherical optical system. This will affect the power law behavior of some variables, e.g., the "exposure time" will be linear rather than quadratic in the f-number, and some numerical coefficients may differ in the two cases by factors of the order of unity, but the essential conclusions should be the same in cylindrical and spherical models. To keep notation simple and physical concreteness in the forefront, throughout the report I will illustrate with geometrically special cases that involve no loss of physical generality.

¹ Idealized instruments use point detectors to look at point sources through infinitesimal apertures; real instruments report the integral over their own detector area of signal received from a finite source area through finite sized apertures. The instrument transmission function is a description, in terms of integrals over aperture and detector dimensions, of the signal that will be seen for any specified source description.

The model optical system is depicted in Figure 1. It consists of a simple, thin, aberration free lens of aperture $2R$, an object plane at distance $z$ measured to the left of lens center, the corresponding image plane at distance $z'$ measured to the right of lens center, and a sensor plane at distance $z''$ also measured to the right of lens center. Locations in the object, corresponding image, and sensor planes are measured by $x$, $x'$, and $x''$ respectively, with the positive direction of $x$ physically opposite to the positive directions of $x'$ and $x''$. The optical model is geometrical, ignoring diffraction entirely.

The object plane is characterized by a source function $W(x,\theta)$ that in this introduction I will take as an angularly Lambertian, spatially sinusoidal grating² representing one Fourier component of the optical power emitted or reflected by the object:

$$W(x,\theta) = W_o \cos(kx+\phi)\,\cos\theta \quad \text{watts-meter}^{-1}\text{-radian}^{-1} \tag{1}$$


The constant $k = 2\pi/\lambda$, where $\lambda$ is the spatial wavelength of the sinusoidal object feature. The constant $\phi$ is a phase factor that describes the symmetry (or lack of symmetry) of the sinusoid about the optical axis: with the cosine form of equation 1, $\phi = -\pi/2$ is the special case of a sine function (antisymmetrical), and $\phi = 0$ is the special case of a cosine function (symmetrical). Direction angle $\theta$ is with respect to the object plane normal.


Figure 1 also shows a typical ray connecting $x_o''$, the center of a pixel that extends from $x_o'' - p$ to $x_o'' + p$, to the object plane, which it intersects at location $x_o$ and angle $\theta$. The power collected by this pixel³ is

$$S = \int_{x_o''-p}^{x_o''+p} \int_{-\operatorname{atan}\frac{R+x''}{z''}}^{+\operatorname{atan}\frac{R-x''}{z''}} W\big(x(x'',\theta''),\,\theta(x'',\theta'')\big)\; d\theta''\, dx'' \quad \text{watts} \tag{2}$$


As  is  often  the  case  in  modeling  instrument  transmission  functions,  the  key  features  of  the  problem 
reside  in  the  limits  of  the  integrals  that  describe  the  physical  averaging  performed  by  the  various 
apertures  in  the  system. 
² A uniform background, structureless in space and time, can be superimposed in the reader's mind if the negative values sometimes assumed by this function are disconcerting. The background makes no net contribution to the spatial difference and temporal derivative signals of interest in this report.

³ Depending on the physical mechanisms underlying transduction, sensors may or may not generate output signals more-or-less linear in the incident optical power. In practice optical detectors, both electronic and photochemical, when used under the conditions recommended by their manufacturers, deliver electrical voltage or developed optical density signals whose amplitudes are approximately linear in the product of incident optical power and integrating time, i.e., these detectors are incident energy sensitive. This functional relationship is not required in any fundamental sense: a detector could in principle respond to the electric field strength (wave amplitude) rather than to power (wave intensity, essentially amplitude squared). The distinction fortuitously evaporates in the usual case of incoherent illumination: averaging over random phases leaves rms power proportional to electric field amplitude. However for coherent (laser) illumination, where this averaging does not occur, the distinction is important. It is also important in sonar ranging: typical modern acoustic transducers, e.g., the ubiquitous Polaroid [5] product, are amplitude (diaphragm displacement) sensitive, whereas typical older transducers, e.g., the carbon granule microphones in telephone mouthpieces, are (I suppose) power sensitive. Sonar ranging modules that compensate for attenuation with range by using an amplifier whose gain is ramped linearly with time are relying on the amplitude sensitivity of the detector. Because photodetectors are in practice intensity sensitive, time linear compensation does not work with optical imaging, e.g., for a given flashbulb or strobe lamp energy pulse the product of f-number (reciprocal square root of exposure) and object distance is a constant, the "guide number".
From the geometry

$$x' = x'' + (z''-z')\tan\theta'' \quad \text{meters}$$

$$\theta = \operatorname{atan}\frac{x + x'' + z''\tan\theta''}{z} \quad \text{radians}$$

From the simple lens equation [1]

$$x = \frac{z}{z'}\,x' \quad \text{meters}$$


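Substituting these relationships, and making small angle, near axis approximations

$$S = \int_{x_o''-p}^{x_o''+p} \int_{-R/z''}^{+R/z''} W_o \cos\!\Big(k\frac{z}{z'}\big(x'' + (z''-z')\,\theta''\big) + \phi\Big)\; d\theta''\, dx'' \quad \text{watts}$$

which integrates to

$$S = \frac{4 W_o\, p\, R}{z''}\; \frac{\sin\frac{kzp}{z'}}{\frac{kzp}{z'}}\; \frac{\sin\frac{kzR(z''-z')}{z'z''}}{\frac{kzR(z''-z')}{z'z''}}\; \cos\!\Big(\frac{kz}{z'}x_o'' + \phi\Big) \quad \text{watts}$$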
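As a sanity check on the integration step, the paraxial double integral can be evaluated numerically and compared with the closed form. The following is a minimal sketch, not part of the original analysis: the function names, the midpoint-rule quadrature, and the 10 μm sensor offset are my own illustrative assumptions.

```python
import math

def sinc(u):
    """sin(u)/u, with the removable singularity at u = 0 filled in."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def pixel_power_numeric(x0, p, R, W0, k, phi, z, z_img, z_sen, n=300):
    """Midpoint-rule estimate of the paraxial double integral (cos(theta) taken as 1)."""
    k_img = k * z / z_img              # object spatial frequency referred to the image plane
    th_max = R / z_sen                 # paraxial half-angle subtended by the lens
    dx, dth = 2.0 * p / n, 2.0 * th_max / n
    total = 0.0
    for i in range(n):
        x = x0 - p + (i + 0.5) * dx    # position across the pixel
        for j in range(n):
            th = -th_max + (j + 0.5) * dth   # ray angle at the sensor
            total += math.cos(k_img * (x + (z_sen - z_img) * th) + phi)
    return W0 * total * dx * dth

def pixel_power_closed(x0, p, R, W0, k, phi, z, z_img, z_sen):
    """Closed form: 4 W0 p (R/z'') times the transverse and longitudinal sinc filters."""
    k_img = k * z / z_img
    A = R / z_sen
    return (4.0 * W0 * p * A * sinc(k_img * p)
            * sinc(k_img * A * (z_sen - z_img)) * math.cos(k_img * x0 + phi))

# Illustrative (assumed) numbers: 20 mm lens, object at 2 m, sensor 10 um behind the image plane.
F, z = 0.020, 2.0
z_img = 1.0 / (1.0 / F - 1.0 / z)      # thin-lens image distance
z_sen = z_img + 10e-6
p = 6.5e-6
k = (math.pi / (2.0 * p)) * z_img / z  # chosen so that k' p = pi/2
args = (0.0, p, 0.005, 1000.0, k, 0.0, z, z_img, z_sen)
numeric = pixel_power_numeric(*args)
closed = pixel_power_closed(*args)
print(numeric, closed)
```

The two values should agree to within the quadrature error, which supports reading the closed form as a product of two independent sinc filters.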

Substituting some convenient definitions:

• $kz/z' \equiv k'$, the object feature spatial frequency in the image plane;

• $R/z'' \equiv A$, the half-aperture, approximately $R/F$, which is exactly $1/(2f)$, where $F$ is the focal length and $f$ is the conventional f-number;

• $k'A \equiv k''$, the object feature spatial frequency in the image plane times the half-aperture, and thus a measure of the depth of field;

• $z'' - z' \equiv \zeta$, the offset between the image and sensor planes;

the result is simply expressed as

$$S = 4 W_o\, p\, A\; \frac{\sin k'p}{k'p}\; \frac{\sin k''\zeta}{k''\zeta}\; \cos(k'x_o''+\phi) \quad \text{watts} \tag{8}$$

This  equation  for  the  instrument  transmission  function  of  a  pixel  is  the  model  in  the  approximation 
stated.  It  describes  the  optical  power  received  by  a  pixel  as  the  product  of  several  physically  sensible 
terms: 

• the image function $W_o\cos(k'x_o'' + \phi)$ corresponding to the source function equation 1, where in the small angle approximation $\cos\theta = 1$;

• the full pixel height $2p$;

• the full lens aperture $2A$;

• a transverse spatial filter $\dfrac{\sin k'p}{k'p}$;

• and a longitudinal spatial filter $\dfrac{\sin k''\zeta}{k''\zeta}$.
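The factorization of equation 8 translates directly into a few lines of code. This is a hedged sketch: `pixel_signal` and its argument names are hypothetical, and the numerical values are illustrative assumptions.

```python
import math

def sinc(u):
    """sin(u)/u, with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def pixel_signal(W0, p, A, k_img, zeta, x0=0.0, phi=0.0):
    """Equation 8: source amplitude x pixel height x aperture x two sinc filters x image phase."""
    k_long = k_img * A                 # longitudinal spatial frequency k'' = k' A
    transverse = sinc(k_img * p)       # averaging over the pixel
    longitudinal = sinc(k_long * zeta) # averaging over the aperture when defocused
    return 4.0 * W0 * p * A * transverse * longitudinal * math.cos(k_img * x0 + phi)

# Assumed illustrative values: 13 um pixel, A = 0.25, feature chosen so k' p = pi/2.
W0, p, A = 1000.0, 6.5e-6, 0.25
k_img = math.pi / (2.0 * p)
in_focus = pixel_signal(W0, p, A, k_img, 0.0)
defocused = pixel_signal(W0, p, A, k_img, 20e-6)
neighbor = pixel_signal(W0, p, A, k_img, 0.0, x0=2.0 * p)
print(in_focus, defocused, neighbor)
```

With these inputs the neighboring pixel, one pixel width away, sees the same magnitude with opposite sign, and defocus only scales the signal through the longitudinal filter.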

The  features  of  this  result  that  I  want  to  investigate  in  this  report  are  its  transverse  pixel-to-pixel 
differences  and  its  longitudinal  derivatives  (conveniently  modeled  as  the  temporal  frequency 
spectrum when $\zeta$ undergoes forced oscillation). In a future report I will discuss the corresponding
three  dimensional  model,  integration  over  multiple  spatial  frequencies  (thus  admitting  realistic  object 
descriptions),  and  the  effects  of  removing  the  paraxial  and  other  smallness  approximations. 


4  Example 


For  concreteness  I  will  assign  "typical"  values  to  the  parameters,  and  use  these  values  for  illustration 
and  comparison  throughout  the  rest  of  this  report: 

• source function $W_o = 1000$ watts-meter$^{-1}$-radian$^{-1}$;

• pixel size $2p = 13\ \mu$m;

• lens focal length $F = 20$ mm, lens aperture $2R = 10$ mm, thus f-number $= 2$ and $A = 0.25$;

• object distance $z = 2$ meters, i.e., magnification 0.01.

An interesting choice for the object feature size is the one for which $k'p = \pi/2$, so that for the 13 μm pixel size $k' = 2.4166\times10^{5}$ meters$^{-1}$ and $k = 2.4166\times10^{3}$ meters$^{-1}$. These correspond to a spatial wavelength in the object plane of 2.6 mm or 26 μm in the image plane, i.e., exactly two pixel widths: a light band falling on one pixel and a dark band falling on an adjacent pixel maximize contrast. The pixelation is then an optimally matched filter for the spatial wavelength.

Finally, $k'' = k'A = 6.042\times10^{4}$ meters$^{-1}$, corresponding to a longitudinal wavelength (for the specified f-number) of 104 μm, twice the f-number times the transverse wavelength. "Small" in the longitudinal direction means small with respect to this distance.
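The derived scales in this example are simple arithmetic on the stated parameters, and can be reproduced as follows. The function and key names are my own, and the magnification is taken as the rounded value 0.01 used in the text rather than computed from the lens equation.

```python
import math

def example_scales(pixel_width=13e-6, focal_length=20e-3, aperture=10e-3, magnification=0.01):
    """Spatial frequencies and wavelengths for the matched condition k' p = pi/2."""
    p = pixel_width / 2.0
    f_number = focal_length / aperture
    A = 1.0 / (2.0 * f_number)          # half-aperture R/F
    k_img = math.pi / (2.0 * p)         # k' from the matching condition
    k_obj = k_img * magnification       # k in the object plane
    k_long = k_img * A                  # k'' = k' A
    return {
        "f_number": f_number,
        "A": A,
        "k_img": k_img,
        "k_obj": k_obj,
        "k_long": k_long,
        "wavelength_obj": 2.0 * math.pi / k_obj,
        "wavelength_img": 2.0 * math.pi / k_img,
        "wavelength_long": 2.0 * math.pi / k_long,
    }

scales = example_scales()
for name, value in scales.items():
    print(f"{name} = {value:.4e}")
```

The longitudinal wavelength comes out as the transverse wavelength divided by $A$, which is the "twice the f-number" relation quoted above.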


5  Differences  Between  Adjacent  Pixels 

Consider a pixel centered on axis at $x_o'' = 0$ and an adjacent pixel centered at $x_o'' = 2p$. By symmetry, when the sensor plane has a pixel centered on-axis (in contrast to having two pixels straddle the axis), the most visible object features will be those for which $\phi = 0$. The difference in signal between the on-axis pixel and an adjacent pixel is then

$$\Delta S = 4 W_o\, p\, A\; \frac{\sin k'p}{k'p}\; \frac{\sin k''\zeta}{k''\zeta}\; (\cos 0 - \cos 2k'p) \quad \text{watts} \tag{9}$$

which expands exactly to

$$\Delta S = \frac{8 W_o A}{k'}\; \frac{\sin k''\zeta}{k''\zeta}\; \sin^3 k'p \quad \text{watts} \tag{10}$$

and for small focusing errors $k''\zeta \ll 1$

$$\Delta S \approx \frac{8 W_o A}{k'}\; \sin^3 k'p\; \Big[1 - \frac{(k''\zeta)^2}{3!}\Big] \quad \text{watts} \tag{11}$$


The absolute value of the difference signal clearly has a local transverse extremum for any integer $n$ satisfying $k'p = n\pi/2$ and a local longitudinal extremum for any integer $m$ satisfying $k''\zeta = m\pi/2$. Because of the $k'$ in the denominator of the numerical coefficient the difference signal has a global maximum when $n = 1$. The best contrast between adjacent pixels is obtained when the image plane feature size has a spatial half-wavelength equal to the pixel diameter.


For this best-contrast condition, with the feature size optimally matched to the pixel size $k'p = \pi/2$ but perhaps away from precise focus:

$$\Delta S_{matched} = \frac{16 W_o A\, p}{\pi}\; \frac{\sin k''\zeta}{k''\zeta} \quad \text{watts} \tag{12}$$

When the sensor plane also coincides with the image plane, $\zeta = 0$, the value of $\frac{\sin k''\zeta}{k''\zeta}$ is unity so

$$\Delta S_{matched}\big|_{\zeta=0} = \frac{16 W_o A\, p}{\pi} \quad \text{watts} \tag{13}$$

This is the largest difference signal that can ever be obtained between adjacent pixels (for an object with a single sinusoidal feature). It is thus convenient to use $S_o = \dfrac{8 W_o A\, p}{\pi}$ as the unit relative to which to measure other signal powers.

With this notation the instrument transmission function is

$$S = \frac{\pi}{2}\, S_o\; \frac{\sin k'p}{k'p}\; \frac{\sin k''\zeta}{k''\zeta}\; \cos(k'x_o''+\phi) \quad \text{watts}$$


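and the difference between signals from adjacent pixels for object features optimally matched to pixelation is exactly

$$\Delta S_{matched} = 2 S_o\, \frac{\sin k''\zeta}{k''\zeta} \quad \text{watts} \tag{15}$$

and in the limit of small $k''\zeta$,

$$\Delta S_{matched} \approx 2 S_o \Big[1 - \frac{(k''\zeta)^2}{3!}\Big] \quad \text{watts} \tag{16}$$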

The physical interpretation is that at focus, with the feature's spatial wavelength and phase matched to the pixelation, the on-axis pixel sees $S_o$, an adjacent pixel sees $-S_o$, and when the sensor plane fails to coincide with the image plane the difference signal is attenuated as $\frac{\sin k''\zeta}{k''\zeta}$, or approximately quadratically in the focusing error.
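To see how gently the adjacent-pixel difference falls off near focus, equations 10 and 16 can be tabulated side by side. This is an illustrative sketch only; the function names and the sampled offsets are assumptions of mine.

```python
import math

def sinc(u):
    """sin(u)/u, with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def adjacent_difference(W0, p, A, k_img, zeta):
    """Equation 10: (8 W0 A / k') * sinc(k'' zeta) * sin^3(k' p)."""
    k_long = k_img * A
    return 8.0 * W0 * A / k_img * sinc(k_long * zeta) * math.sin(k_img * p) ** 3

def matched_difference_small(S0, k_long, zeta):
    """Equation 16: quadratic small-defocus approximation, 2 S0 [1 - (k'' zeta)^2 / 3!]."""
    return 2.0 * S0 * (1.0 - (k_long * zeta) ** 2 / 6.0)

# Assumed example values: matched feature (k' p = pi/2), 13 um pixel, A = 0.25.
W0, p, A = 1000.0, 6.5e-6, 0.25
k_img = math.pi / (2.0 * p)
k_long = k_img * A
S0 = 8.0 * W0 * A * p / math.pi
for zeta in (0.0, 2e-6, 5e-6, 10e-6):
    exact = adjacent_difference(W0, p, A, k_img, zeta)
    approx = matched_difference_small(S0, k_long, zeta)
    print(f"zeta = {zeta * 1e6:4.1f} um  exact = {exact * 1e3:.4f} mW  quadratic = {approx * 1e3:.4f} mW")
```

For offsets of a few micrometers the two columns differ only in the fourth or fifth significant figure, which is the flatness that makes the difference signal a poor focus indicator.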

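The matched condition is optimal for focusing on contrast. Its sensitivity to $\zeta$ is given by the derivative

$$\frac{d\,\Delta S_{matched}}{d\zeta} = 2 S_o k'' \Big[\frac{\cos k''\zeta}{k''\zeta} - \frac{\sin k''\zeta}{(k''\zeta)^2}\Big] \quad \text{watts-meter}^{-1} \tag{17}$$

which to first-order in $\zeta$, and recalling $k'' = k'A$, is

$$\frac{d\,\Delta S_{matched}}{d\zeta} \approx \frac{-2 S_o\, k'^2 A^2\, \zeta}{3} \quad \text{watts-meter}^{-1}$$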


which,  of  course,  could  have  alternatively  been  obtained  by  directly  differentiating  equation  16. 

Two  points  are  worth  noting: 

•  the  sensitivity  improves  rapidly  with  increasing  aperture  (and  might  be  predicted  to  do  so 
even  more  rapidly  with  spherical  optics); 

• nevertheless, the situation is hopeless at $\zeta = 0$: the effect we would use to detect a
discrepancy  between  the  image  plane  and  the  sensor  plane  has  zero  slope  when  the 
discrepancy  is  zero. 

The  last  point  is  not  so  serious  if  the  goal  of  focusing  is  just  to  obtain  a  sharp  image:  that  the 
derivative  of  the  difference  signal  is  small  simply  says  that  the  endpoint  is  not  critical.  But  if  part  of 
the  goal  of  focusing  is  to  obtain  range-from-focus,  this  result  makes  the  prospects  seem  grim  indeed. 

Returning to the ongoing numerical example, $S_o = 4.138$ mW, so $\Delta S_{matched} = 8.276\,\frac{\sin k''\zeta}{k''\zeta}$ mW. Then how big does the focusing error $\zeta$ have to be to make a one-bit difference in $\Delta S$? Suppose (to be generous) that we can digitize the difference to 8 bits when $\Delta S$ is a half scale signal. We want to know the value of $k''\zeta$ that makes $\Delta S_{matched}/2S_o$ differ from unity by 1/128. The answer (obtained graphically) is $k''\zeta \approx 0.216$, which for $k'' = 6.042\times10^{4}$ meters$^{-1}$ corresponds to a focusing error $\zeta = 3.6\ \mu$m. The corresponding range error is found by resubstituting the lens equation into its own derivative with respect to $z'$:

$$\frac{\delta z}{z} = \frac{\zeta}{z'}\Big[1 - \frac{z}{F}\Big] \tag{19}$$

To a good approximation, the fractional range error is the fractional focusing error times the reciprocal of the magnification, 100 in this example. Thus a one bit signal change that corresponds to a 3.6 μm focusing error in a 20 mm focal length corresponds to 1.8% range error, or 36 mm range error in 2 meters. In any but the lowest precision real world application this range error would be unacceptable.
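The graphical solution can be cross-checked with a simple bisection, and the range error follows from equation 19. The sketch below is an assumption-laden illustration: the function name is hypothetical, the image distance is computed from the thin-lens equation rather than rounded to the focal length, and equation 19 is used in the form $\delta z/z = (\zeta/z')(1 - z/F)$.

```python
import math

def sinc(u):
    """sin(u)/u, with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def defocus_for_sinc_drop(drop, iters=60):
    """Bisection for u in (0, pi/2) with sinc(u) = 1 - drop; sinc is decreasing there."""
    lo, hi = 0.0, math.pi / 2.0
    target = 1.0 - drop
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sinc(mid) > target:
            lo = mid       # not enough attenuation yet, move outward
        else:
            hi = mid
    return 0.5 * (lo + hi)

k_long = 6.042e4                        # meters^-1, from the running example
u = defocus_for_sinc_drop(1.0 / 128.0)  # dimensionless k'' * zeta
zeta = u / k_long                       # focusing error in meters
F, z = 0.020, 2.0
z_img = 1.0 / (1.0 / F - 1.0 / z)       # thin-lens image distance
fractional_range_error = (zeta / z_img) * (1.0 - z / F)
range_error = fractional_range_error * z
print(f"k''zeta = {u:.4f}, zeta = {zeta * 1e6:.2f} um, range error = {range_error * 1e3:.1f} mm")
```

The bisection lands within rounding of the graphical estimate, and the resulting range error is a few centimeters at 2 meters, in line with the figure quoted above.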

The  rest  of  this  report  suggests  a  class  of  data  collection  and  processing  technologies  that  show 
on-paper  promise  of  being  able  to  use  the  longitudinal  structure  of  the  image  to  obtain  range-from- 
focus  with  high  accuracy. 

6  The  Temporal  Dimension 

I  now  investigate  the  signal  observed  in  the  time  domain  from  a  single  pixel  when  the  sensor  plane  is 
both  offset  from  the  image  plane  and  is  driven  in  a  small  amplitude  oscillation: 

$$\zeta = \zeta_0 + a\sin\omega t \quad \text{meters} \tag{20}$$

In the absence of practical three dimensional image sensors⁴, imagining a pixel plane in longitudinal
oscillation,  especially  with  the  recognition  that  synchronous  detection  can  then  be  employed,  is  a 
useful  expository  tool  as  well  as  a  proposal  for  a  practical  implementation. 


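For a pixel on-axis $x_o'' = 0$, an object symmetrical about the axis $\phi = 0$, and object feature and pixel sizes satisfying $k'p = \pi/2$

$$S(t) = S_o\, \frac{\sin k''(\zeta_0 + a\sin\omega t)}{k''(\zeta_0 + a\sin\omega t)} \quad \text{watts}$$

For small $\zeta$

$$S(t) = S_o\Big[1 - \frac{k''^2(\zeta_0 + a\sin\omega t)^2}{3!}\Big] \quad \text{watts}$$

which expands to

$$S(t) = S_o\Big[1 - \frac{k''^2}{3!}\big(\zeta_0^2 + 2\zeta_0 a\sin\omega t + a^2\sin^2\omega t\big)\Big] \quad \text{watts}$$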

In units of $S_o$ we then have for the power spectrum:

• a DC term $1 - \dfrac{k''^2\zeta_0^2}{3!}$, which is just half the adjacent pixel difference equation 16;

• an AC term synchronous with the driving term, $\dfrac{-2k''^2\zeta_0 a}{3!}$ (which can alternatively be interpreted as a positive amplitude and a phase shift of $\pi$ with respect to the driving term);

• a term whose time dependence corresponds to $\sin^2\omega t$ and whose amplitude is independent of $\zeta_0$, i.e., independent of focus, and is thus a measure of the product of all the uncertain intensity and geometry related weights.

Noting that $\sin^2\omega t = \dfrac{1-\cos 2\omega t}{2}$, the $\sin^2\omega t$ amplitude is further interpreted as

• another DC term of power $-\dfrac{(k''a)^2}{3!\,2}$;

• an AC term of power $+\dfrac{(k''a)^2}{3!\,2}$ with frequency and phase corresponding to $\cos 2\omega t$.

⁴ Some attempts at building processing layers behind sensing layers are underway [2], but I am unaware of any efforts to build three dimensional sensing lattices.

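The net power accounting is then:

• for the DC term: $P_0 = 1 - \dfrac{(k''\zeta_0)^2}{3!} - \dfrac{(k''a)^2}{3!\,2}$

• for the $\sin\omega t$ term: $P_\omega = \dfrac{-2k''^2 a\zeta_0}{3!}$

• for the $\cos 2\omega t$ term: $P_{2\omega} = \dfrac{(k''a)^2}{3!\,2}$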
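The harmonic accounting can be checked without the small-argument expansion by projecting the exact dithered signal onto $\sin\omega t$ and $\cos 2\omega t$ over one dither period. This is a minimal numerical sketch; `demodulate` is a hypothetical helper, and the dimensionless test values $k''\zeta_0 = k''a = 0.2$ are my own choice.

```python
import math

def sinc(u):
    """sin(u)/u, with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def demodulate(k_long, zeta0, a, n=4096):
    """Return (DC, sin(wt) coefficient, cos(2wt) coefficient) of S(t)/S_o over one period."""
    dc = first = second = 0.0
    for i in range(n):
        wt = 2.0 * math.pi * i / n
        s = sinc(k_long * (zeta0 + a * math.sin(wt)))   # exact single-pixel signal
        dc += s
        first += s * math.sin(wt)
        second += s * math.cos(2.0 * wt)
    # Factor 2 converts the mean of the projection into a Fourier coefficient.
    return dc / n, 2.0 * first / n, 2.0 * second / n

k_long, zeta0, a = 1.0, 0.2, 0.2        # dimensionless units: lengths measured in 1/k''
dc, first, second = demodulate(k_long, zeta0, a)
print(f"DC = {dc:.6f}, first harmonic = {first:.6f}, second harmonic = {second:.6f}")
print("small-argument predictions:",
      1.0 - 0.04 / 6.0 - 0.04 / 12.0, -2.0 * 0.04 / 6.0, 0.04 / 12.0)
```

For arguments this small the projected coefficients agree with the small-argument expressions to better than a percent; the residual is the neglected fourth-order term of the sinc expansion.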

The ratio of the signals at the second harmonic and the fundamental driving frequencies is $\dfrac{P_{2\omega}}{P_\omega} = \dfrac{-a}{4\zeta_0}$, which becomes arbitrarily large as $\zeta_0$ approaches zero, i.e., as focus is achieved. This ratio might thus provide a high sensitivity, high accuracy focusing criterion.

How much useful AC signal there is depends on $k''$, the depth of field in the image distance in relation to the object feature size. If the transverse matching condition $k'p = \pi/2$ is satisfied then the longitudinal condition is $k'' = \dfrac{\pi}{4pf}$. Recall that in the ongoing numerical example, corresponding to these conditions and some typical parameters, $k'' = 6.042\times10^{4}$ meters$^{-1}$, or $1/k'' = 16.55\ \mu$m. For offset and dither amplitudes of this order-of-magnitude the longitudinal smallness approximation is valid to about 1%. As a practical matter, displacements of this size could be easily obtained piezoelectrically. Then continuing the example, if we take $a = \zeta_0 = 1/k''$ the relative signal powers are $P_0 : P_\omega : P_{2\omega} = 0.75 : -0.3333 : 0.0833$. Since $P_{2\omega}$ grows quadratically with $a$, even higher modulation fractions are obtainable within the realm of plausible electronically driven sensor plane displacements.
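The quoted relative powers, and the way the harmonic ratio can be inverted for the offset, can be reproduced from the small-argument expressions. The helper names below are hypothetical, and the inversion assumes the small-argument formulas hold.

```python
def small_argument_powers(k_long, zeta0, a):
    """DC, first-harmonic, and second-harmonic terms in units of S_o (small-argument form)."""
    p0 = 1.0 - (k_long * zeta0) ** 2 / 6.0 - (k_long * a) ** 2 / 12.0
    p1 = -2.0 * k_long ** 2 * a * zeta0 / 6.0
    p2 = (k_long * a) ** 2 / 12.0
    return p0, p1, p2

def offset_from_harmonics(a, p1, p2):
    """Invert P_2w / P_w = -a / (4 zeta0) for the offset zeta0."""
    return -a * p1 / (4.0 * p2)

k_long = 6.042e4                 # meters^-1, from the running example
a = zeta0 = 1.0 / k_long         # dither amplitude and offset both equal to 1/k''
p0, p1, p2 = small_argument_powers(k_long, zeta0, a)
recovered = offset_from_harmonics(a, p1, p2)
print(f"P0 : Pw : P2w = {p0:.4f} : {p1:.4f} : {p2:.4f}")
print(f"recovered offset = {recovered * 1e6:.2f} um")
```

Because the unknown scale $S_o$ cancels in the ratio, the recovered offset depends only on the known dither amplitude and the two measured harmonic amplitudes.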

7  Temporal-Longitudinal  vs  Spatial-Transverse  Domains 

Combining the results of the two previous sections, the ratio of the maximum difference between signals from adjacent pixels to the time domain single pixel signals at DC, first harmonic, and second harmonic is

$$2 \;:\; 1 - \frac{(k''\zeta_0)^2}{3!} - \frac{(k''a)^2}{3!\,2} \;:\; \frac{-2k''^2 a\zeta_0}{3!} \;:\; \frac{(k''a)^2}{3!\,2}$$

Substituting $\zeta = \zeta_0 + a\sin\omega t$ into the expression for $\Delta S_{matched}$ and averaging over time, i.e., making a DC measurement, shows that $2P_0$ is effectively the same as $\Delta S_{matched}$.

The DC components $\Delta S_{matched}$ and $P_0$ are unable to distinguish between signal due to focus error and extraneous signals (noise) induced by motion and vibration of the object or the camera, changes in illumination, changes in thermal dark current in the sensor, electronic noise in the detection system, etc. In contrast the AC signals from individual pixels have several properties that make them potentially immune to fluctuations and noise, and thus sensitive to focus with a high signal-to-noise ratio:

• ratiometric measurement: the ratio of the first harmonic signal to the second harmonic signal is $\dfrac{-4\zeta_0}{a}$, independent of illumination, optical system uncertainties, small motions, vibrations, fluctuations, etc.;

• zero crossing: the first harmonic signal and the ratio signal have zero crossings at offset $\zeta_0 = 0$, i.e., at exact focus, and this is a desirable condition for detectability;

• synchronous detection: the first and second harmonic signals are phase-locked to the dither; synchronous detection methods⁵ can thus cleanly extract these signals from noisy environments;

• insensitivity to flicker noise: modulation (dither) and AC detection move the measurement from near DC to a higher frequency regime in which flicker (or $1/f$) noise may be dramatically lower.
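The ratiometric and synchronous-detection properties listed above can be illustrated with a toy digital lock-in. Everything here is an assumption for illustration: the simulated pixel, the white Gaussian noise model, the constant background, and the parameter values are mine, and a real implementation would presumably use analog synchronous amplifiers as the text suggests.

```python
import math
import random

def sinc(u):
    """sin(u)/u, with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def simulate_pixel(gain, background, k_long, zeta0, a,
                   cycles=200, samples_per_cycle=64, noise=0.01, seed=1):
    """Dithered pixel record: unknown gain and background plus additive Gaussian noise."""
    rng = random.Random(seed)
    record = []
    for i in range(cycles * samples_per_cycle):
        wt = 2.0 * math.pi * i / samples_per_cycle
        signal = gain * sinc(k_long * (zeta0 + a * math.sin(wt)))
        record.append(signal + background + rng.gauss(0.0, noise))
    return record

def lock_in_offset(record, a, samples_per_cycle=64):
    """Project onto sin(wt) and cos(2wt), then invert the harmonic ratio for zeta0."""
    n = len(record)
    x1 = x2 = 0.0
    for i, s in enumerate(record):
        wt = 2.0 * math.pi * i / samples_per_cycle
        x1 += s * math.sin(wt)
        x2 += s * math.cos(2.0 * wt)
    x1, x2 = 2.0 * x1 / n, 2.0 * x2 / n
    return -a * x1 / (4.0 * x2)          # small-argument relation X1/X2 = -4 zeta0 / a

k_long = 6.042e4
zeta0, a = 0.1 / k_long, 0.3 / k_long    # assumed true offset and dither amplitude
dim = lock_in_offset(simulate_pixel(1.0, 0.5, k_long, zeta0, a), a)
bright = lock_in_offset(simulate_pixel(5.0, 3.0, k_long, zeta0, a), a)
print(f"true {zeta0 * 1e6:.3f} um, dim scene {dim * 1e6:.3f} um, bright scene {bright * 1e6:.3f} um")
```

In this toy model the two estimates agree with the true offset to within the noise even though the gain and background differ by large factors, which is the behavior the ratiometric argument predicts.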

Hands-on  experience  in  different  but  analogous  problem  domains  [7]  leads  me  to  expect  that  these 
methods  could  yield  an  advantage  of  several  orders-of-magnitude  with  even  cursory  attention  to  good 
engineering  practice. 


8  Extensions  and  Pitfalls 

Extending  the  model  to  spherical  optics  looks  straightforward,  although  the  additional  integration  over 
an  azimuthal  coordinate  may  involve  some  messy  intermediate  algebra.  I  expect  the  result  will  look 
very  similar  to  the  one  presented  here,  with  the  complication  of  a  transverse  filter  term  for  the 
$y$-direction, signal proportional to the square of the lens aperture, and slightly different numerical
coefficients. 

Removing  the  paraxial  ray  approximations  should  be  straightforward  although  the  more  general 
results are usually regrettably less revealing of the intuitive physics and geometry.

Removing  the  smallness  approximation  on  the  arguments  of  the  transverse  and  longitudinal  spatial 
filter  functions  should  similarly  be  straightforward.  The  result  could  bring  some  surprises,  especially 
longitudinally,  since  the  longitudinal  scale  distance  is  typically  rather  small. 

By  far  the  most  important  restriction  to  remove  is  the  description  of  the  object  as  a  simple  sinusoidal 
grating.  When  the  object  space  is  described  as  a  Fourier  integral  over  a  spatial  frequency  continuum 
instead  of  as  a  single  spatial  frequency,  will  the  effect  be  to  wash  out  the  structures  I  am  counting  on 
detecting,  i.e.,  as  many  optical  interference  effects  are  washed  out  when  a  monochromatic  light 
source  is  replaced  by  a  polychromatic  light  source,  or  will  the  pixelation  act  as  a  matched  spatial  filter 
that  selects  exactly  what  is  needed  to  focus  on  surface  texture?  If  the  answer  is  washing  out  rather 
than  selecting  out,  then  my  method  will  be  useful  only  for  artificially  simple  scenes. 

Experimental verification can be envisioned as real-time and direct, by electromechanically driving the sensor plane (or, more-or-less equivalently, the lens or even the focal length) at an audio frequency and analog parallel processing of the signals from a few pixels. High temporal bandwidth detectors, e.g., photodiodes, would be desirable. Alternatively, an indirect, non-real-time equivalent would involve stepping the sensor plane a fraction of the longitudinal scale distance between successive frames from a conventional video camera, with after-the-fact digital analysis. The direct real-time approach is preferred: it could take full advantage of synchronous detection, whereas the indirect simulation, with little practical prospect for averaging over many cycles, could easily be disabled by fluctuation noise.

⁵ Also known in different implementations and contexts as lock-in amplification, phase sensitive amplification, and synchronous rectification, the methods are powerful for extracting small signals with line spectra from overpowering backgrounds with continuum spectra. The signal from a reference oscillator both modulates the source (in this case: dithers the sensor plane) and in effect dials in the center frequency of the detector input filter. Synchronous detection methods are robust in that they squeeze out continuum noise by extreme narrow-banding, but because of their inherent tracking ability they exhibit no line spectrum signal loss with reference oscillator frequency drift.

A final potential pitfall is that real photosensors do not necessarily stop all the light incident on them in
effectively  zero  thickness.  The  sensor  thickness  is  manifested  as  an  averaging  operation  in  the 
longitudinal  direction  that  attenuates  the  signals  developed  by  the  method  proposed.  With  some 
sensor  types  this  attenuation-by-thickness  might  be  a  fatal  flaw. 